    Characterizing Deep-Learning I/O Workloads in TensorFlow

    The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during training, a large number of relatively small files are first loaded and pre-processed on CPUs and then moved to accelerators for computation. In addition, checkpoint and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to a fast, small-capacity storage and asynchronously copy the checkpoints to a slower, large-capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to slower storage on our benchmark environment. Comment: Accepted for publication at pdsw-DISCS 201
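    The burst-buffer pattern this abstract describes — checkpoint synchronously to fast, small-capacity storage, then drain asynchronously to slow, large-capacity storage — can be sketched in plain Python. This is an illustrative sketch, not the paper's implementation; the function and directory names are invented for the example.

    ```python
    import os
    import shutil
    import threading

    def checkpoint_with_burst_buffer(data: bytes, name: str,
                                     fast_dir: str, slow_dir: str) -> threading.Thread:
        """Write a checkpoint to fast storage, then copy it to slow
        storage in the background so training can resume immediately."""
        os.makedirs(fast_dir, exist_ok=True)
        os.makedirs(slow_dir, exist_ok=True)
        fast_path = os.path.join(fast_dir, name)
        # Synchronous write to the fast, small-capacity tier (the burst buffer).
        with open(fast_path, "wb") as f:
            f.write(data)
        # Asynchronous copy to the slow, large-capacity tier.
        t = threading.Thread(target=shutil.copy2,
                             args=(fast_path, os.path.join(slow_dir, name)))
        t.start()
        return t  # caller may join() before overwriting the next checkpoint
    ```

    Training only blocks on the fast write; the slow copy overlaps with the next compute phase, which is where the reported 2.6x checkpointing improvement comes from.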

    Flux-tube-dependent propagation of Alfvén waves in the solar corona

    Context. Alfvén-wave turbulence has emerged as an important heating mechanism to accelerate the solar wind. The generation of this turbulent heating depends on the presence and subsequent interaction of counter-propagating Alfvén waves. This requires us to understand the propagation and evolution of Alfvén waves in the solar wind in order to develop an understanding of the relationship between turbulent heating and solar-wind parameters. Aims. We aim to study the response of the solar wind upon injecting monochromatic single-frequency Alfvén waves at the base of the corona for various magnetic flux-tube geometries. Methods. We used an ideal magnetohydrodynamic model with an adiabatic equation of state. An Alfvén pump wave was injected into the quiet solar wind by perturbing the transverse magnetic-field and velocity components. Results. Alfvén waves were found to be reflected due to the development of the parametric decay instability (PDI). Further investigation revealed that the PDI was suppressed both by efficient reflections at low frequencies and by certain magnetic flux-tube geometries. Peer reviewed
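    Injecting an Alfvén wave by perturbing the transverse magnetic-field and velocity components, as the Methods section describes, conventionally follows the Walén relation for a wave propagating along the background field. A minimal statement with standard symbols (these are the textbook relations, not equations quoted from the paper):

    ```latex
    % Transverse perturbations of an Alfvén wave propagating along B_0
    % satisfy the Walén relation; the sign selects the propagation
    % direction (mu_0: vacuum permeability, rho: mass density).
    \delta \mathbf{v} = \mp \frac{\delta \mathbf{B}}{\sqrt{\mu_0 \rho}},
    \qquad
    v_A = \frac{B_0}{\sqrt{\mu_0 \rho}}
    ```

    Counter-propagating waves (opposite signs in the relation) are exactly the ingredient the turbulent-heating mechanism requires, which is why PDI-driven reflection matters here.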

    PolyPIC: the Polymorphic-Particle-in-Cell Method for Fluid-Kinetic Coupling

    Particle-in-Cell (PIC) methods are widely used computational tools for fluid and kinetic plasma modeling. While both the fluid and kinetic PIC approaches have been successfully used to target either kinetic or fluid simulations, little has been done to combine fluid and kinetic particles under the same PIC framework. This work addresses this issue by proposing a new PIC method, PolyPIC, that uses polymorphic computational particles. In this numerical scheme, particles can be either kinetic or fluid, and fluid particles can become kinetic when necessary, e.g., particles undergoing a strong acceleration. We design and implement the PolyPIC method, and test it against the Landau damping of Langmuir and ion acoustic waves, the two-stream instability, and sheath formation. We unify the fluid and kinetic PIC methods under one common framework comprising both fluid and kinetic particles, providing a tool for adaptive fluid-kinetic coupling in plasma simulations. Comment: Submitted to Frontier
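    The core idea — a particle that starts as a fluid element and is promoted to kinetic when a fluid closure stops being adequate — can be sketched as a small data structure. This is a toy illustration of the polymorphism, not the PolyPIC algorithm itself; the class name, fields, and the acceleration threshold are invented for the example.

    ```python
    from dataclasses import dataclass

    ACCEL_THRESHOLD = 10.0  # illustrative promotion threshold, not from the paper

    @dataclass
    class Particle:
        """A polymorphic computational particle: it carries fluid moments
        until promoted, after which it is pushed as a kinetic particle."""
        x: float
        v: float
        kind: str = "fluid"

        def maybe_promote(self, accel: float) -> None:
            # Fluid particles become kinetic under strong acceleration,
            # where the fluid description is no longer valid.
            if self.kind == "fluid" and abs(accel) > ACCEL_THRESHOLD:
                self.kind = "kinetic"

    p = Particle(x=0.0, v=1.0)
    p.maybe_promote(accel=2.0)   # weak field: stays fluid
    p.maybe_promote(accel=50.0)  # strong acceleration: promoted to kinetic
    ```

    Promotion is one-way in this sketch, matching the abstract's description of fluid particles becoming kinetic "when necessary".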

    A kinetic perspective of electron trapping near a weakly outgassing comet

    The European Space Agency's Rosetta spacecraft followed the comet 67P/Churyumov-Gerasimenko from August 2014 to September 2016, providing observations of the comet ionosphere at varying heliocentric distances. Measurements from the Rosetta mission have shown a multitude of non-thermal electron distributions in the cometary environment, challenging the previously assumed origin and plasma-interaction mechanisms near a cometary nucleus. In this thesis, we discuss electron trapping near a weakly outgassing comet from a fully kinetic (particle-in-cell) perspective which self-consistently describes the ambipolar field. Using electromagnetic fields derived from the simulation, we characterize the trajectories of trapped electrons in the potential well surrounding the cometary nucleus and identify the distinguishing features in their respective velocity and pitch-angle distributions. In accordance with theoretical findings in space plasma, our analysis allows us to define a clear boundary in velocity phase space between the distributions of trapped and passing electrons.
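    The trapped/passing boundary the abstract mentions is, at its simplest, an energy criterion: an electron is confined when its kinetic energy is below the depth of the potential well. A minimal sketch of that criterion in Python — the function name and the example numbers are illustrative, and this ignores the pitch-angle dependence the thesis also characterizes:

    ```python
    # Illustrative trapped/passing classifier for electrons in an
    # electrostatic potential well of depth Phi (in volts).
    E_CHARGE = 1.602176634e-19     # elementary charge, C
    M_ELECTRON = 9.1093837015e-31  # electron mass, kg

    def is_trapped(speed: float, well_depth_volts: float) -> bool:
        """An electron is trapped when its kinetic energy (1/2 m v^2)
        is below the well depth e * Phi; otherwise it is passing."""
        kinetic = 0.5 * M_ELECTRON * speed ** 2
        return kinetic < E_CHARGE * well_depth_volts
    ```

    In velocity phase space this criterion is a sphere of radius sqrt(2 e Phi / m): everything inside is trapped, everything outside passes, which is the kind of clear boundary the analysis identifies.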

    An Evaluation of the TensorFlow Programming Model for Solving Traditional HPC Problems

    Computationally intensive applications such as pattern recognition and natural language processing are increasingly popular on HPC systems. Many of these applications use deep-learning, a branch of machine learning, to determine the weights of artificial neural network nodes by minimizing a loss function. Such applications depend heavily on dense matrix multiplications, also called tensorial operations. The use of Graphics Processing Units (GPUs) has considerably sped up deep-learning computations, leading to a renaissance of artificial neural networks. Recently, the NVIDIA Volta GPU and the Google Tensor Processing Unit (TPU) have been specially designed to support deep-learning workloads. New programming models have also emerged for convenient expression of tensorial operations and deep-learning computational paradigms. An example of such new programming frameworks is TensorFlow, an open-source deep-learning library released by Google in 2015. TensorFlow expresses algorithms as a computational graph where nodes represent operations and edges between nodes represent data flow. Multi-dimensional data such as vectors and matrices that flow between operations are called tensors. For this reason, computational problems need to be expressed as a computational graph. In particular, TensorFlow supports distributed computation with flexible assignment of operations and data to devices such as GPUs and CPUs on different computing nodes. Computation on devices is based on optimized kernels such as MKL, Eigen, and cuBLAS. Inter-node communication can be through TCP and RDMA. This work attempts to evaluate the usability and expressiveness of the TensorFlow programming model for traditional HPC problems. As an illustration, we prototyped a distributed block matrix multiplication for large dense matrices which cannot be co-located on a single device and a Conjugate Gradient (CG) solver. 
    We evaluate the difficulty of expressing traditional HPC algorithms using computational graphs and study the scalability of distributed TensorFlow on accelerated systems. Our preliminary result with distributed matrix multiplication shows that distributed computation on TensorFlow is extremely scalable. This study provides an initial investigation of new emerging programming models for HPC. Published in Proceedings of the 5th International Conference on Exascale Applications and Software. Edinburgh: The University of Edinburgh (2018), ISBN: 978-0-9926615-3-3, pp.34, Published under license CC BY-ND 4.0.
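    The block matrix multiplication the paper distributes across TensorFlow devices rests on a standard decomposition: partition the matrices into tiles and accumulate tile-by-tile products. A single-process, pure-Python sketch of that decomposition (the function name and block size are illustrative; the paper's version assigns each block product to a TensorFlow device):

    ```python
    # Blocked dense matrix multiply C = A @ B on nested lists.
    # Each (i0, j0, k0) block product is independent, which is what
    # makes the computation distributable across devices.

    def matmul_blocked(A, B, block=2):
        n, m, p = len(A), len(B), len(B[0])
        C = [[0.0] * p for _ in range(n)]
        for i0 in range(0, n, block):
            for j0 in range(0, p, block):
                for k0 in range(0, m, block):
                    # Multiply the A[i0:, k0:] tile with the B[k0:, j0:]
                    # tile and accumulate into the C[i0:, j0:] tile.
                    for i in range(i0, min(i0 + block, n)):
                        for j in range(j0, min(j0 + block, p)):
                            s = 0.0
                            for k in range(k0, min(k0 + block, m)):
                                s += A[i][k] * B[k][j]
                            C[i][j] += s
        return C
    ```

    Because the inner block products only share read access to their tiles and accumulate into disjoint or commutatively-updated output tiles, the decomposition maps naturally onto TensorFlow's graph model of independent operations on separate devices.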